Um, one large pizza. A preliminary study of disfluency modelling for improving ASR

Authors

  • Ben Hutchinson
  • Cecile Pereira
Abstract

A corpus of spontaneous telephone transactions between call centre operators of a pizza company and its customers is examined for disfluencies (fillers and speech repairs) with the aim of improving automatic speech recognition. From this, a subset of the customer orders is selected as a test set. An architecture is presented which allows filled pauses and repairs to be detected and corrected. A language repair module removes fillers and reparanda and transforms utterances containing them into fluent utterances. An experiment on filled pauses using this module and architecture is then described. A speech recognition grammar for recognising fluent speech is used to provide a baseline. This grammar is then enriched with filled pauses, based on their placement in relation to syntactic boundaries. Evaluation is done at the level of understanding, using a metric on feature structures. Initial results indicate that incorporating filled pauses at syntactic boundaries improves the recognition results for spontaneous continuous speech containing disfluencies.
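The filler-removal step of the language repair module described above can be illustrated with a minimal sketch. This is not the authors' implementation: the filler inventory `FILLERS`, the function name `remove_fillers`, and the regex tokenisation are all assumptions made for illustration, and reparandum handling is left out entirely.

```python
import re

# Hypothetical filler inventory; the abstract does not list the
# filled pauses actually found in the corpus.
FILLERS = {"um", "uh", "er", "erm", "ah"}


def remove_fillers(utterance: str) -> str:
    """Return a lower-cased copy of the utterance with filled pauses dropped."""
    # Tokenise on runs of letters/apostrophes, discarding punctuation.
    tokens = re.findall(r"[a-z']+", utterance.lower())
    # Keep every token that is not in the assumed filler set.
    fluent = [tok for tok in tokens if tok not in FILLERS]
    return " ".join(fluent)


if __name__ == "__main__":
    print(remove_fillers("Um, one large pizza"))  # one large pizza
```

A simple set lookup like this treats every occurrence of a filler word the same way; it does not use the placement of filled pauses relative to syntactic boundaries, which is what the enriched recognition grammar in the study relies on.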


Similar resources

Phonetic evidence for two types of disfluency

Disfluency, such as pause (silences), filled pause (e.g., ‘um’, ‘uh’), repetition (e.g., ‘the the’) and cutoff word (e.g., ‘hori[zontal]-’), is a common part of human speech that occurs at a rate of 6 to 10 per 100 words [2, 5]. According to one model of speech production [8], there are two types of disfluency: disfluency at the internal planning stage (e.g., word-retrieval difficulties), and di...


Joint Transition-based Dependency Parsing and Disfluency Detection for Automatic Speech Recognition Texts

Joint dependency parsing with disfluency detection is an important task in speech language processing. Recent methods show high performance for this task, although most authors make the unrealistic assumption that input texts are transcribed by human annotators. In real-world applications, the input text is typically the output of an automatic speech recognition (ASR) system, which implies that...


Tight Integration of Speech Disfluency Removal into SMT

Speech disfluencies are one of the main challenges of spoken language processing. Conventional disfluency detection systems deploy a hard decision, which can have a negative influence on subsequent applications such as machine translation. In this paper we suggest a novel approach in which disfluency detection is integrated into the translation process. We train a CRF model to obtain a disfluen...


Prosodic parallelism as a cue to repetition and error correction disfluency

Complex disfluencies that involve the repetition or correction of words are frequent in conversational speech, with repetition disfluencies alone accounting for over 20% of disfluencies. These disfluencies generally do not lead to comprehension errors for human listeners. We propose that the frequent occurrence of parallel prosodic features in the reparandum (REP) and alteration (ALT) intervals...


Prosodic parallelism as a cue to repetition disfluency

Repetition disfluencies are among the most frequent types of disfluency in conversational speech, accounting for over 20% of disfluencies, yet they do not generally lead to comprehension errors for human listeners. We propose that parallel prosodic features in the REP and ALT intervals of the repetition disfluency provide strong perceptual cues that signal the repetition to the listener. We repo...




Publication date: 2001